A Hierarchical Consensus Architecture for Robust Document Clustering

نویسندگان

  • Xavier Sevillano
  • Germán Cobo
  • Francesc Alías
  • Joan Claudi Socoró
چکیده

A major problem encountered by text clustering practitioners is the difficulty of determining a priori which is the optimal text representation and clustering technique for a given clustering problem. As a step towards building robust document partitioning systems, we present a strategy based on a hierarchical consensus clustering architecture that operates on a wide diversity of document representations and partitions. The conducted experiments show that the proposed method is capable of yielding a consensus clustering that is comparable to the best individual clustering available even in the presence of a large number of poor individual labelings, thus largely avoiding the problems derived from selecting non-optimal clustering configurations.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

MOTIVATION Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods give results which are dependent on the initialization of the algorithm. Therefore, it is difficult t...

متن کامل

Robust Method for E-Maximization and Hierarchical Clustering of Image Classification

We developed a new semi-supervised EM-like algorithm that is given the set of objects present in eachtraining image, but does not know which regions correspond to which objects. We have tested thealgorithm on a dataset of 860 hand-labeled color images using only color and texture features, and theresults show that our EM variant is able to break the symmetry in the initial solution. We compared...

متن کامل

خوشه‌بندی اسناد مبتنی بر آنتولوژی و رویکرد فازی

Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...

متن کامل

Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics

This paper discusses about the future of the World Wide Web development, called Semantic Web. Undoubtedly, Web service is one of the most important services on the Internet, which has had the greatest impact on the generalization of the Internet in human societies. Internet penetration has been an effective factor in growth of the volume of information on the Web. The massive growth of informat...

متن کامل

Hierarchical self-refining consensus architectures and soft consensus functions for robust multimedia clustering Arquitecturas de consenso jerárquicas auto-refinables y funciones difusas de consenso para agrupamiento robusto de datos multimedia

PhD thesis in Information and Communication Technologies and their Management. The author was examined in June 2009 in Barcelona by the following committee: Paolo Rosso (Univ. Politécnica de Valencia), Aristides Gionis (Yahoo! Research), Juan José Rodŕıguez (Univ. de Burgos), Jordi Turmo (Univ. Politécnica de Catalunya) and Ester Bernadó (La Salle Univ. Ramon Lull). The grade obtained was Summa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007